Main

Understanding human embryogenesis is central to treating congenital diseases and infertility as well as creating functional human cells and organs for transplantation. Immediately after implantation, the embryo and codeveloping extra-embryonic tissues are profoundly remodelled and initiate morphological changes central to the success of pregnancy, including the formation of the amniotic cavity and emergence of yolk sac haematopoiesis1,2. However, early post-implantation stages of human development are difficult to study due to limited access, technical complexities and ethical concerns3,4,5. Furthermore, with its flat bilaminar disc structure, the early post-implantation stage human embryo morphologically differs from its mouse counterpart. In vitro human stem cell-based embryo models or human embryoids have recently emerged, opening tremendous biomedical opportunities to study this stage of human life in detail6,7,8,9,10,11,12,13,14,15,16. However, current models each have their own drawbacks, such as instability in post-implantation stage progression, lack of crucial extra-embryonic layers, low efficiency and technical challenges, including reliance on complex culture conditions that may not effectively support a wide range of cell types with optimal efficiency. Furthermore, none of the models have been able to reconstitute the intricate process of early human haematopoiesis, a critical event affecting both embryo viability and adult human health17.

To overcome some of these challenges, we present a human embryoid model containing extra-embryonic niche and yolk sac haematopoiesis dubbed ‘heX-embryoid’. This platform leverages an approach that genetically engineers a population of human-induced pluripotent stem (hiPS) cells. On induction, an extra-embryonic hypoblast-like niche is formed that triggers three-dimensional (3D) self-organization of the epiblast-like cells. heX-embryoid is similar to a flattened yolk sac cavity surrounding an epiblast–hypoblast interface and amniotic cavity (Extended Data Fig. 1a). It shows the development of a human bilaminar disc structure, specification of anterior hypoblast-like cells, lumenogenesis, formation of amnion-like tissue and symmetry breaking leading to posterior pole specification. Its extra-embryonic-like tissue shows maturation towards yolk sac tissue identity and the emergence of possible haematopoietic niches together with different waves of blood-like cells. Our platform lacks any trophoblast layer or analogue; hence, it offers a ‘non-integrated’ model of human embryogenesis with extra-embryonic haematopoiesis, at an early implantation stage when the embryo in vivo is at its most inaccessible state.

Epiblast and hypoblast codevelopment

We engineered an hiPS cell line with an inducible transgene that expresses human GATA6 (iGATA6), a key transcription factor for extra-embryonic endodermal fate decision (Fig. 1a and Extended Data Fig. 1b). These iGATA6 cells express different amounts of GATA6 on the addition of a small molecule, doxycycline (Dox), distinguishable by expression of the reporter enhanced green fluorescent protein (EGFP) (Extended Data Fig. 1c). When iGATA6 and wild-type (WT) hiPS cells were co-assembled in 3D, they yielded aggregates in which the entire EGFP+ extra-embryonic endoderm-like cells (outer layer) were in close contact with WT epiblast-like cells (inner layer), missing parts analogous to yolk sac tissue that was formed away from the epiblast cells (parietal). Furthermore, we found that WT cells could not consistently induce symmetry breaking to amnion- and epiblast-like domains and the overall organization did not recapitulate the structural format of the bilaminar disc (Extended Data Fig. 1d,e). Moreover, in vitro reattachment of 3D embryo-like structures to a dish to mimic implantation has shown organizational instability that limits developmental potential6.

Fig. 1: Engineering codevelopment of embryonic and extra-embryonic endoderm tissues.
figure 1

a, Schematic demonstrating cell mixing and subsequent organization of iGATA6 (green) and WT (orange) hiPS cells after GATA6 induction by Dox. b, IF staining demonstrating self-organization of heX-embryoids in the cultures. c, Live time-lapse images of one embryoid showing growth of a central lumen from a rosette indicated by an arrow. d, Phase and fluorescence images from live day 4 cultures showing developed heX-embryoid morphologies with EGFP-expressing iGATA6 around a WT cluster. e, Quantification of characteristics of WT clusters possessing different iGATA6 coverage and lumen formation characteristics. No islands without coverage were observed possessing a lumen (±0%). n represents the number of heX-embryoid assessed from at least two biological replicates. f, Single-cell UMAP and hypergeometric statistical comparisons of differentially expressed gene (DEG) lists between day 2 embryoids and human and cynomolgus monkey embryo single-cell datasets. Labels above heatmaps indicate pre-implantation versus post-implantation embryo sample comparisons; grey bars under labels indicate cynomolgus monkey comparisons. Leading letters indicate the source dataset: T from ref. 23, X from ref. 24, M from ref. 26 and XC from refs. 24,25. g, Single-cell UMAP and hypergeometric statistical comparisons of differentially expressed gene lists between day 4 embryoids and human and cynomolgus monkey embryo single-cell datasets. Black boxes highlight high DEG similarity (low P values) to relevant populations of interest. YSE, yolk sac endoderm; YSM, yolk sac mesoderm. For hypergeometric tests, we performed Benjamini–Hochberg correction for multiple tests in all comparisons. Tests were performed as one-sided tests. Scale bar, 50 µm (c), 100 µm (d), 1,000 µm (b). Illustration in a was created using BioRender (https://biorender.com).

Source Data

To address these challenges, we aimed to develop a strategy to generate structures off of the cell culture plate by igniting two-dimensional (2D)-to-3D self-organization. The iGATA6-hiPS cells were mixed with WT hiPS cells, and the mixed population of cells was seeded onto standard culture plates at a defined cell ratio and density (Fig. 1a). After treatment with Dox, the iGATA6 cells upregulate GATA6 and EGFP as expected, and show loss of the pluripotency marker NANOG (Extended Data Fig. 1f). During the first 48 h, from an initial state of random distribution, induced iGATA6 cells and WT cells proliferate, and WT cells are organized into disc-shaped clusters confined by iGATA6 cells (Supplementary Video 1, Fig. 1b and Extended Data Fig. 1a,g,h). Subsequently, the migration of the iGATA6 cells over the upper surface of these WT clusters occurs in conjunction with 3D growth within the WT disc (Extended Data Fig. 1i and Supplementary Video 2). This behaviour was observed when the system was maintained on mTeSR (Extended Data Fig. 1j). These events result in a membrane of extra-embryonic endoderm-like cells that express GATA4 and SOX17 but lack OCT4 over a multilayer WT structure (Fig. 1a,c,d and Extended Data Fig. 1i). The process of upwards iGATA6 migration occurs in parallel with the deposition of a laminin membrane surrounding the WT clusters (Extended Data Fig. 2a,b). The laminin deposition by the iGATA6 cells can subsequently trigger cell polarization within cells of the WT cluster, which promotes the formation of cellular rosettes central to the WT clusters, resembling embryonic morphology around Carnegie Stage (CS)5a (Fig. 1c and Extended Data Fig. 2a). We observe the conversion of these rosettes into lumens, mimicking the expansion of the pro-amniotic cavity around CS5b–CS5c of development (Fig. 1c, Extended Data Fig. 2a and Supplementary Videos 3 and 4)18. This is facilitated by expression of PODXL, an apically expressed antiadhesive surface protein implicated in embryonic lumen formation, and ZO-1, an apical tight junction protein (Extended Data Fig. 2d,e)19. This laminin deposition and lumen formation is not observed in WT-only culture in equivalent conditions (Extended Data Fig. 2f). We noticed that when an iGATA6 layer did not surround the WT clusters, lumen formation did not occur (Fig. 1e). The number of clusters with no coverage by an iGATA6 layer progressively decreases by day 5 as the migration of iGATA6 cells moves forward (Fig. 1e). Ultimately, following expansion of the lumen, the two cell monolayers, composed of epiblast-like WT (NANOG+) and endoderm-like GATA6+ cells, are positioned on either side of a laminin membrane (Fig. 1a,c and Extended Data Fig. 2d), forming a cellular arrangement similar to the bilaminar disc in the early post-implantation human embryo20. Analysis of the culture media showed high amounts of secreted AFP and APOA1, two proteins known to be produced by primitive endoderm and yolk sac tissues (Extended Data Fig. 2g).

During our initial analysis, we noticed that at day 5, in WT clusters with higher area and often lower circularity, cavity formation occurred by several lumens (Extended Data Fig. 2h,i). To optimize cluster size and number, we tested a range of seeding ratio and densities for iGATA6 and WT cells, to achieve cluster area most consistently within a size range corresponding to the E9–E17 human bilaminar disc (Extended Data Fig. 2j) (Carnegie nos. 8004 and 7700 in ref. 21, Carnegie no. 7801 in ref. 22). To this end, we selected a seeding ratio of 81/5 iGATA6/WT at a density of 54,000 cells per cm2 (Extended Data Fig. 2j, dotted boxes). When we applied this optimized seeding density, we could detect 74% of embryoids in the expected physiological size range (Extended Data Fig. 2k). We also observed most (70.9%) of our embryoids in this size range produced a single lumen (Extended Data Fig. 2k). Thus, by adjusting these initial parameters, we could gain control over size, number, circularity and lumen formation.

Intra- and/or extra-embryonic scRNA trajectories

To uncover the cell types that develop during the observed morphogenetic events, we conducted single-cell RNA sequencing (scRNA-seq) from day 0 to day 5 of cultures (Extended Data Figs. 35). Performing an unsupervised comparison of the transcriptomes with the E16–19 human embryo23, E6–E14 human embryo samples24,25 and the E18–20 cynomolgus embryo26 revealed the acquisition of distinct post-implantation-like cellular fates in our identified subpopulations (Fig. 1f,g and Extended Data Fig. 4a–c). As early as 24 h (D1) after induction, cultures showed the emergence of cell populations (D1_G1, G2, G3) with transcriptomic similarity to human hypoblast, alongside a corresponding epiblast-like population (D1_W1) (Extended Data Fig. 4a–c). By 36 h (D1.5), iGATA6 cells become statistically similar to the human E12 hypoblast lineage and this similarity strengthens through to day 5 of culture (Extended Data Fig. 4b)24. Comparison with the E16–19 human and cynomolgus embryo shows significant similarity to yolk sac endoderm and hypoblast, appearing at D1 and strengthening to day 5 (P = 6.00 × 10−54 to 2.84 × 10−93 similarity significance to yolk sac (YS) Endoderm at day 5) (Fig. 1f and Extended Data Fig. 4a,c)23,26.

On day 2, there is significant statistical similarity to yolk sac endoderm within the iGATA6 population (clusters D2_G1, D2_G2, D2_G3, D2_G4; P = 1.3 × 10−17 to 9.5 × 10−69), with the highest GATA6-expressing cells (cluster D2_G5) diverging from this fate towards a more yolk sac mesoderm-like state (P = 6.8 × 10−11) (Fig. 1f and Extended Data Fig. 4a–c). By day 4, we observed the strengthening of the identities and of the observed fate divergence. The lower GATA6 clusters (D4_G1 and D4_G2) showed increased expression of key yolk sac marker genes (GATA4, PDGFRA, LAMA1, CUBN, AMN, NODAL) and statistical similarity to human yolk sac endoderm (Fig. 1g and Extended Data Figs. 3b and 4a–c, PG1 = 1.3 × 10−92, PG2 = 1.8 × 10−63)23,27. Each of the day 4 putative iGATA6 endoderm was also statistically compared to the differentially expressed gene lists of yolk sac endoderm and proliferating definitive endoderm from the E16–19 human embryo. The lower GATA6 clusters D4_G1 and D4_G2 show a higher number of genes in common with yolk sac, including apolipoprotein genes, PDGFRA and CUBN (Extended Data Fig. 3c)23, which is corroborated with the differential Jaccard similarity to yolk sac endoderm between these populations (Extended Data Fig. 3d).

We next assessed other developed populations in the iGATA6 layer. We observed that a ‘medium’ GATA6 population (D4_G3 and D4_G4) had a strong anterior hypoblast-like identity (homologous to mouse anterior visceral endoderm, with specific upregulation of anterior hypoblast markers (LHX1, HHEX, FZD5, CER1, LEFTY1) (Extended Data Fig. 3b)23,28,29,30. It showed lower statistical similarity to hypoblast and/or yolk sac endoderm compared with low expressing GATA6 populations (Fig. 1g and Extended Data Fig. 4). This population shows a higher number of differentially upregulated genes in common with the definitive endoderm, which corroborates a recent study aligning human anterior endoderm more closely to a definitive endoderm trajectory (Extended Data Fig. 3c,d)31. We observed that the highest GATA6 clusters, D4_G5 and D4_G6, demonstrated statistical similarity to human yolk sac mesoderm with very subtle to no equivalent similarity to emergent, nascent and axial mesoderm (Fig. 1g and Extended Data Fig. 4a,b, PG5 = 1.7 × 10−22, PG6 = 2.5 × 10−25). Our analysis revealed that the signature for extra-embryonic mesoderm emerged as early as 12 h after the system’s induction in the highest GATA6 population (Extended Data Fig. 4a). Taken together, our data demonstrate a divergence in tissue fates similar to those in the yolk sac on the basis of GATA6 expression amounts, with yolk sac endoderm-, anterior hypoblast- and yolk sac mesoderm-like cells stratifying between populations expressing low, medium and high average amounts of GATA6 (Extended Data Fig. 4 top row).

Recently, observations in the cynomolgus monkey by Zhai et al. indicate a similar stratification of GATA6 concentrations in a variety subset of tissues, notably between yolk sac endoderm (lower GATA6) and mesoderm (higher GATA6) populations32. Within the WT clusters, identified by a lack of GATA6 or EGFP transcripts, we observe the divergence of three separate populations from the epiblast-like compartment. The first and largest cluster (D4_W1) maintained the similarity to human epiblast, with expression of the pluripotency markers POU5F1 (OCT4), SOX2 and NANOG (Extended Data Figs. 3b and 4a, PW1 = 1.15 × 10−49). The second (D4_W2) WT cluster had lower statistical similarity to human epiblast (PW2 = 3.4 × 10−20 to D16–19 Epi), but stronger similarity to human primitive streak (PW2 = 3.6 × 10−95 to D16 PrS) (Fig. 1g and Extended Data Figs. 3b and 4a). Our assessment of the third WT cluster at this time point (D4_W3) showed the occurrence of a unique transcriptomic similarity to human and cynomolgus amnion (PW3 = 8.0×10−15 and 1.6 × 10−25, respectively) (Fig. 1g and Extended Data Figs. 3b and 4a,c)33. We did not observe a similarity to human trophoblast lineages (Extended Data Fig. 4b). When we integrated all clusters across different days, iGATA6 and WT lineages occupied distinct compartments within the projection space. However, we did observe spatial positioning of populations aligned with their time of sampling (Extended Data Fig. 5a). Populations sampled at different times can show similar fates (that is, day 3 versus day 5 iGATA6 yolk sac endoderm-like cells) (Extended Data Fig. 5b). The top-upregulated genes show distinct patterns across different cell fates, with overlapping genes present across similar fates sampled at different time points (Extended Data Fig. 5c).

Specification of amniotic ectoderm

Primate amniogenesis differs structurally and temporally from mice, emphasizing the need for human model systems33. During the early post-implantation phase, following lumenogenesis in the epiblast, the epiblast cells undergo dorsal–ventral patterning to separate the epiblast from amniotic ectoderm34. Our single-cell transcriptomic analysis in WT cells showed a group of cells with transcriptomic similarity to human and cynomolgus amnion that emerged around day 3 (Fig. 2a,b and Extended Data Figs. 4a and 5b). These cells express amnion markers, including ISL1, TFAP2A (AP-2α) and GATA3, maintaining expression of OCT4 while substantially lacking expression of NANOG (Fig. 2a,b)16,33. Our analysis through immunofluorescent (IF) staining corroborated this finding, demonstrating a layer of ISL1+AP-2α+ in heX-embryoids (Fig. 2c–e and Extended Data Fig. 6a). We observed that these cells were spatially segregated within the cavitated sac-like structure, with the amnion-like lineage positioning away from the bilaminar iGATA6/WT disc, forming a membrane against the tissue culture dish. WT cells above, within the bilaminar iGATA6/WT disc, showed expression of NANOG (Fig. 2c–e and Supplementary Video 5). We found that all embryoids containing cavitated structures (100%, n = 350) expressed ISL1 at a level high above the background signal (Fig. 2f).

Fig. 2: Amniotic cavity formation and expansion.
figure 2

a, Merged UMAP of all WT lineages from the embryoids labelled by day of development. b, The merged WT population showing the compartment expressing markers of amnion. ISL1, TFAP2A (AP-2α) and GATA3 are expressed in this area, whereas NANOG is negative and OCT4 is low. c, An orthogonal slice of an individual embryoid showing top to bottom compartmentalization of ISL1+ and NANOG+ cells. iGATA6 (white) cells cover the top. d, Horizontal z-slices at the indicated distance from the dish of the WT cluster from c. e, Schematic showing the position of each population within a single heX-embryoid. Dotted lines indicate the area of slices shown. f, Violin plot showing the expression of ISL1 in day 4 embryoids containing lumens (n = 350). The dotted line indicates a negative threshold for ISL1. g, Expression patterns of BMP4 effectors (phosphorylated SMAD1, SMAD5 and SMAD8/9). Lower images show a lateral slice of the WT disc. h, IF staining for the BMP4 effectors (phosphorylated SMAD1, SMAD5 and SMAD8/9) at day 4 of cultures after application of the inhibitor Noggin at day 2 of development. Lower images show a lateral slice of the WT disc. i, IF staining for ISL1 and NANOG at day 4 after application of Noggin at day 2 of development. Lower images show a lateral slice of the WT disc. Scale bars, 100 μm. n represents embryoid structures harvested from three independent experiments. a.u., arbitrary units.

Source Data

Further IF staining for phosphorylated (p)SMAD1, pSMAD5 and pSMAD8/9 (BMP4 effectors) revealed BMP4 signal transduction in a ring pattern concentrated at the edges of the embryoid (Fig. 2g and Extended Data Fig. 6b). scRNA-seq at day 4 also showed specific upregulation of BMP4 and its targets within the amnion-like population (Extended Data Fig. 6c). Recently, the active role of primate amniotic tissue to promote BMP4 signalling has been suggested, acting downstream of ISL1 (ref. 33). It was also reported that BMP4 can promote the formation of ISL1+ amniotic tissue in a human embryo model16,33. Hence, to probe the link between BMP4 signalling and amnion fate in heX-embryoid, we aimed to inhibit BMP signalling by treatment with Noggin. We observed effective suppression of SMAD1/5/8 phosphorylation (Fig. 2h and Extended Data Fig. 6d), and a marked decrease in amnion fate acquisition assessed by ISL1 expression (Fig. 2i and Extended Data Fig. 6e,f). Collectively, we conclude that our platform shows robust amniogenesis with formation of a dorsal–ventral axis that is dependent on BMP4 signalling, supporting the use of these embryoids for studying this key developmental event in humans.

Anterior hypoblast and posterior domain

In vivo evidence shows a group of cells with anterior visceral endoderm characteristics in human and non-human primates that have been termed anterior hypoblast29. In mice, this tissue influences the posterior axis specification through the expression of signalling inhibitors, including CER1. Our scRNA-seq analysis of iGATA6 cells as early as day 2 revealed a cluster of cells that show markers associated with anterior hypoblast. These cells are positive for CER1, LHX1, HHEX, LEFTY1 and LEFTY2 (Fig. 3a)29. A population expressing these markers continued to be present in the following days (days 3 to 5) (Fig. 3a) but showed a distinct cluster from day 2, primarily related to cell cycle-related pathways (Extended Data Fig. 5d). Moreover, the analysis detected the emergence of a distinct cluster of cells at day 3 and 4 in WT cells presenting TBXT (Brachyury), MIXL1, GDF3 and EOMES, key markers associated with development of the posterior pole in the embryo (Fig. 3b)35.

Fig. 3: Anterior hypoblast domain and posterior pole in heX-embryoids.
figure 3

a, Merged UMAP of all iGATA6 lineages showing the compartments expressing markers of anterior hypoblast. Two separate domains of these markers were observed, one within day 2 and one within day 3–5 of the embryoid cells. (UMAP labelled by day can be seen in Extended Data Fig. 5). b, Merged UMAP of all WT lineages showing the compartment expressing markers of the posterior pole and primitive streak. c, IF staining showing a z-slice of a heX-embryoid with a HHEX/LHX1/CER1 copositive domain adjacent to a WT cluster. The dotted line indicates the boundaries of the WT cluster in each image, and the arrow indicates the copositive domain. d, IF staining showing a TBXT/MIXL1 copositive domain within the WT clusters of the embryoid. e, Examples of WT clusters with polarized TBXT/MIXL1 domains (top) or without co-expression of markers (bottom). The pie chart shows proportions of cluster types observed in the embryoid cultures, n = 746 total. f, A pie chart showing the proportion of clusters with a particular TBXT/CER1 polarity type when CER1 confined to one pole is present. n = 169 total. g, Representative WT clusters showing TBXT and CER1 expression domains at opposing poles of the cluster (anti-polar, top) or at the same pole of the cluster (syn-polar, bottom). h, Diagrams indicating the average radial expression patterns of TBXT in each polarity pattern from WT clusters with a polarized CER1-expressing domain indicated in f. All diagrams are scaled to the same expression intensity value (1 a.u.). Degrees indicate radial distance around the circularized perimeter of a WT cluster from the CER1 peak shown. Shaded areas indicate region of highest average TBXT polarity corresponding to the polarity types. Scale bars, 100 µm. n represents the embryoid structures from three to four separate experiments harvested on day 4 after induction.

Source Data

Following up with IF staining, we identified domains of cells expressing anterior hypoblast-related proteins (LHX1, HHEX) in a subset of extra-embryonic cells surrounding the WT clusters as early as day 2 (Extended Data Fig. 6g). On day 4, CER1 is observed to be expressed in a polar manner within the LHX1+/HHEX+ domains (Fig. 3c). Out of 396 embryoids assessed, we identified 42.4 ± 4.1% (mean ± s.d. 169) embryoids with polar CER1-expressing domains on day 4, showing a similar efficiency to existing ex vivo human blastocyst cultures11,29. In parallel, we showed development of TBXT+ domains in 46.4 ± 9.0% (323 out of 746) of WT clusters on day 4. 30.8 ± 5.0% (mean ± s.d.; 215) of these 746 clusters showed asymmetric (polar) TBXT expression (Fig. 3d,e and Supplementary Video 6). Treatment with Noggin could eliminate TBXT+ poles, whereas we still observed the specification of CER1-expressing cells (Extended Data Fig. 6h,i). Although this observation warrants further mechanistic investigation in the future, it highlights the role of BMP4 from amnion-like structures in posterior domain specification in this model, corroborating with past reports36.

We then analysed our embryoids to evaluate anterior–posterior axis positionings. On day 4, we observed that 61.2 ± 8.6% (mean ± s.d.; 102 of 169) of embryoids with polar configuration in CER1-expressing cells possessed a TBXT-expressing pole (Fig. 3f). Within embryoids with polarity in both CER1 and TBXT we observed two distinct patterns (Fig. 3f,g and Extended Data Fig. 6j,k). 41.2 ± 8.5% show an anti-polar configuration, in which the TBXT+ pole occupies the opposite radial region as the CER1+ domain, similar to anterior–posterior axis positioning in mouse at E6.5 (Fig. 3f–h and Extended Data Fig. 6k). In a second set of polar embryoids, we saw high expression of CER1 in the proximity of TBXT+ domains (Fig. 3f–h and Extended Data Fig. 6j). This set of embryoids shows CER1+ and TBXT+ poles with a same radial region (syn-polar), in contrast to past reported literature.

We also noted expression of CER1 in a subset of TBXT+ cells in the posterior domains in some cases (Extended Data Fig. 6j,k) and subsequent scRNA-seq analysis showed the presence of a subpopulation of CER1+TBXT+ cells in the cluster that represents posterior identity (Extended Data Fig. 6l). Co-expression of CER1 and TBXT may be reflective of a more advanced stage post-E14 in which CER1 expression acts either to prevent spreading of posterior domain37 or represent early mesendoderm commitment derived from a primitive streak32. Further research is needed to explore the in vivo relevance of co-expressing cells and the observed non-canonical polarization types (syn-polar) in the human context. In summary, we propose heX-embryoids offer a model to study anterior–posterior axis biology, the development of anterior hypoblast and posterior poles, and their respective cellular progeny in a human context.

Yolk sac mesoderm and blood progenitors

During human development, yolk sac serves dual roles as a nutritional source and a site for early blood production, beginning around E16–18 (refs. 1,38,39). We detected developmental and functional maturation of cells in the iGATA6 layer aligned with human yolk sac endoderm and mesoderm fates (Fig. 1g and Extended Data Figs. 2f and 4a). Hypergeometric statistical analysis of the clusters with highest expression of GATA6 on day 4 (D4_G6) and 5 (D5_G3) reveals significant similarity to human yolk sac mesoderm, haematopoietic lineage progenitors and haemogenic endothelial cells using the annotation of the E16–19 human embryo. However, the haemogenic endothelial nature of this cluster cannot be determined without further analysis. (Extended Data Fig. 7a,b)23. Examination of the D5_G3 cluster reveals higher expression of the extra-embryonic mesoderm cell surface marker BST2, as well as of ECM proteins (Extended Data Fig. 7a)39,40. Subpopulations of cells within this cluster also show a higher average expression markers of human yolk sac mesoderm (CREB3L1, NR2F2, PLAGL1, ANXA1, NID2)39,40, markers of endothelial cells (CD34, PECAM1 (CD31), TEK, ICAM1, PLVAP)23,41 and markers of human haematopoiesis (GATA1, GATA2, KLF1, KLF2, ZEB1, ZEB2, GYPA (CD235a) and GYPB (CD235b)) (Fig. 4a–c)23,39,41.

Fig. 4: Haematopoietic lineages and haematopoietic foci structures in the heX-embryoids.
figure 4

a, Dot plot showing the expression pattern of yolk sac mesoderm markers in day 5 embryoid scRNA-seq populations. b, Dot plot showing the expression pattern of endothelial markers in day 5 embryoid scRNA-seq populations. c, Dot plot showing the expression pattern of haematopoietic markers in day 5 embryoid scRNA-seq populations. d, IF image of day 12 embryoid showing the generation of CD43+ spherical cells within CD31+ endothelial cells. e, Live phase image taken on day 12 showing cells spherical cells. The dotted box indicates area of the inset. f, IF image of a day 12 culture after GATA6-hi cells were added into the starting cell mix. g, Bar plot showing the number of CD43+ cells detected on day 12 after the given percentage of GATA6-hi cells were supplemented. n(0%) = 4 replicates, n(10%) = 2 replicates, n(25%) = 2 replicates. Error bars represent mean ± s.e.m. h, Two slices from a single representative structure demonstrating Desmin+ mesoderm-like cells and CD34+ endothelial-like cells localized underneath a FOXA2+ endoderm-like layer, reminiscent of yolk sac blood island morphology. i, Histogram showing the z-distribution of the indicated markers between the bottom of the dish and the top of the culture. Bimodal distribution of FOXA2 indicates areas of expression outside the haematopoietic foci. Distributions are averaged from nine foci from three technical replicates. j, 3D reconstruction of one haematopoietic focus showing the positioning of endoderm (FOXA2), endothelial (CD34) and mesoderm (Desmin) marker expression. k, Schematic depicting in vivo embryonic yolk sac blood islands and in vitro haematopoietic foci structures. Scale bars, 100 µm. Illustration in k was created using BioRender (https://biorender.com).

Source Data

IF analysis on day 5 initially reveals spindle-shaped cells positive for TAL1 (scl), a key regulator of blood development. Further analysis shows a subset of these cells were copositive for CD34 and ERG, markers associated with haematopoietic and endothelial fates; we also identified a small population copositive for TAL1 and RUNX1, itself another master transcription factor in haematopoiesis (Extended Data Fig. 7c–e)41. Image analysis shows ERG+ endothelial-like cells consistently emerged underneath an iGATA6 layer (Extended Data Fig. 7f), resembling the initial in vivo localization of human yolk sac mesoderm, where it is positioned against ECM protein in the area underneath the yolk sac endoderm39. The CD34+TAL1+ cells were also EGFP+, indicating their development from an iGATA6-derived parental population (Extended Data Fig. 7d). Staining for the endothelial markers KDR (VEGFR2) and CDH5 confirmed similar localization in day 5 embryoids (Extended Data Fig. 7g). To further characterize the development of these endothelial-like cells with haematopoietic characteristics, we followed these cultures beyond day 5. We observed minimal expansion or differentiation of the CD34+ endothelium in mTeSR media (Extended Data Fig. 7h). Switching to Iscove’s modified Dulbecco’s medium (IMDM) basal media without Dox after day 5 resulted in a notable expansion of the CD34+ endothelial-like cells by day 12 (Extended Data Fig. 7h,i). We also observed the emergence of condensed areas containing spherical cells in the yolk sac tissue-like compartment by day 12 (Fig. 4d,e). IF staining of these areas revealed the presence of a CD31+ endothelial layer surrounding spherical cells expressing Leukosialin (CD43), a marker of haematopoietic progenitors (Fig. 4d and Supplementary Video 7)42. Having noted that yolk sac mesodermal-like and haemogenic progenitor-like cells were predominantly associated with the GATA6-hi cluster in scRNA-seq analysis, we further enriched heX-embryoid cultures at the time of seeding with cells with a high copy number of the inducible GATA6 circuit. This supplementation resulted in a notable increase in the generation of CD43+ haematopoietic-like foci (Fig. 4f,g and Extended Data Fig. 8a,b), demonstrating the ability to predictably and controllably enhance the final culture phenotype at the time of seeding.

Yolk sac-like haematopoiesis

Haematopoietic processes emerge in vivo in the form of blood islands within yolk sac tissues. Structural image analysis of CD43+ foci in heX-embryoids revealed an intricate 3D hierarchical organization. CD34+ endothelial vessel-like structures were packed between FOXA2+ endoderm-like cells occupying the top compartment, while Desmin+ mesoderm-like cells formed the basal layer (Fig. 4h–j, Extended Data Fig. 8c,d and Supplementary Video 8). This cellular arrangement resembles the reported morphology of blood islands in the developing human yolk sac at roughly E19–23 (Fig. 4k)39.

Further analysis showed the specification of cells expressing markers for erythroid (CD235a and haemoglobin), myeloid and/or macrophage (CD33 and CX3CR1) and megakaryocyte lineages (CD42b and CD41) (Fig. 5a–c). These cells are specified in multilineage or uni-lineage foci within the yolk sac-like tissue. We show foci containing both erythroid-like and megakaryocyte-like cells or foci with myeloid-like and megakaryocyte-like cells. Most haematopoietic foci observed were multilineage, with a smaller fraction being uni-lineage (for example, CD42b+ uni-lineage foci; roughly 87.2–95.0% multi- versus 8.8–5.0% uni-lineage among all foci) (Fig. 5d,e and Extended Data Fig. 9a–d). Next, we examined the haematopoietic-like lineages that emerge beyond the second week of culture. A colony-forming unit (CFU) assay initiated from day 21 embryoids confirmed haematopoietic potential and showed generation of both erythroid-like and myeloid-like colonies (Fig. 5f). We then performed scRNA-seq on day 21 of the embryoids and compared these data against the E16–19 human embryo dataset23. We were able to identify cell populations with significant similarity to endothelial cells, erythroblasts, myeloid and/or erythro-myeloid progenitors and blood progenitors within this dataset (Fig. 5g and Extended Data Fig. 9e). To further interrogate the presence of a yolk sac-like haematopoiesis, we also investigated transcript expression for HOX genes. The HOXA family gene expression is not found in the haematopoietic wave preceding the emergence of aorta–gonad–mesonephros haematopoietic stem cells43. By contrast, HOXB7 and HOXB9 can be detected in haematopoietic cells during this phase. We show that endothelial-like and haematopoietic-like progenitors in heX-embryoid lack medial HOXA assessed by HOXA5, HOXA7 and HOXA9 while expressing posterior HOXB5, HOXB7 and HOXB9 genes (Extended Data Fig. 9f)43,44. These cells also show expression of embryonic genes LIN28A, GAD1 and FGF23 that are shown to be correlated with early waves of haematopoiesis and prehaematopoietic stem cell haematoendothelial programmes (before CS10-11) (Extended Data Fig. 9g)44. Furthermore, the expression of HBE1 and HBZ in our erythroid-like cells demonstrate an expression pattern unique to yolk sac erythropoiesis45. Collectively, we conclude that the de novo haematopoiesis-like process in heX-embryoids demonstrates yolk sac characteristics.

Fig. 5: Haematopoietic programme characterization in heX-embryoids.
figure 5

a, IF of day 12 erythroid-like progenitors. b, IF of day 12 myeloid-like progenitors. c, IF of day 12 megakaryocyte-like progenitors. d, IF of day 17 multilineage haematopoietic foci. e, IF of day 17 uni-lineage haematopoietic foci. f, Results of a CFU methylcellulose assay seeded from day 21 cultures. Counts from n = 3 biological replicates shown. Example colonies are shown. CFU-E, colony forming unit-erythroid; BFU-E, burst forming unit-erythroid; GM, colony forming unit-granulocyte, macrophage; GEMM, colony forming unit-granulocyte, erythrocyte, macrophage, megakaryocyte. g, UMAP of day 21 populations. Boxed area shows populations with similarity to in vivo haematopoietic lineages. Hypergeometric statistical comparison to ref. 23 for these clusters is shown. For hypergeometric tests, we performed the Benjamini–Hochberg correction for multiple comparisons in all comparisons. Tests were one-sided. h, IF of day 12 and 21 cultures, showing representative erythroid-like colonies with primitive (high Hb ε, low Hb γ) characteristics on day 12 and definitive (high Hb γ, high Hb ε) characteristics on day 21. i, qPCR showing change HBE and HBG expression ratio across the indicated days. **P = 0.0012 (confidence interval 95%). P was calculated using one-way ANOVA with Tukey’s multiple comparisons test between days 12 and 21. n = 3 biological replicates sampled per day. j, Flow cytometry scatterplots showing erythroid- (CD235ab+), myeloid- (CD33+) and megakaryocyte-like (CD42b+) populations within the CD43+ population of day 21 cultures. Bar plots show cells identified within the CD43+ population in eight biological replicates. k, Flow cytometry analysis of week 3 culture for neutrophil-like cells, pregated for CD45+ cells. The bar plot shows the percentage of CD31+CD15+ cells in three biological replicates. l, Flow cytometry analysis of week 2 culture for lymphoid-like progeny, pregated for CD45+CD117+ cells. The bar plot shows the percentage of CD117+CD43+CD7+ cells in three biological replicates. m, Flow cytometry analysis of week 3 culture for natural killer-like cells, pregated for CD45+VLA-4+ cells. The bar plot shows the percentage of CD56+VLA-4+ cells in four biological replicates. Error bars are ±s.e.m. Scale bars, 100 µm.

Source Data

Cell composition of haematopoietic waves

Yolk sac haematopoiesis comprises two waves: a primitive wave begins at CS7 (week 2.5 postfertilization), generating early erythroid, myeloid and megakaryocyte progenitors, and a subsequent definitive wave starts at roughly CS8–9 (week 3.25 postfertilization) comprising erythro-myeloid (EMP) and lymphoid-primed multipotent progenitors programmes46,47. We investigated these haematopoietic waves in our platform.

Primitive erythroid progenitors show high expression of embryonic globin (Hb ε) and low expression of fetal globin (Hb γ). As the embryonic erythroid programme shifts to produce EMP-derived erythroid cells, the change results in an inversion of this pattern. This inversion is reflected in a decline in the Hb ε to Hb γ ratio, resulting in a nearly even average ratio of expression by 4 weeks of gestation45,48. IF staining of heX-embryoid cultures revealed that Hb-expressing cells in week 2 embryoids (days 12 and 15) show high Hb ε and little to no expression of Hb γ. However, Hb-expressing cells at day 21 show an increase in Hb γ concentration (Fig. 5h). We assayed the culture at different time points using both image analysis of the individual cells and quantitative PCR with reverse transcription (RT–qPCR) of whole cultures. Both analyses showed a decline in the ratio of HBE to HBG from day 12 to 21 (Fig. 5i, Extended Data Fig. 10a–c and Supplementary Table 1). The temporal switch in the erythroid-like cells potentially implies a primitive-to-EMP-like transition in erythropoiesis. However, the maturation of primitive erythrocytes alone may also involve a globin switch49, which warrants further investigation.

As EMP haematopoiesis results from the transition of haemogenic endothelial cells to haematopoietic progenitors, we next investigated the presence of a haemogenic endothelial signature within the endothelial population. Examination of the endothelial-like cell cluster in scRNA-seq identified a subcluster expressing genes matching those reported in haemogenic endothelial cells50. This population showed canonical endothelial markers (CD34, CDH5, ERG), lacked the expression of NT5E (CD73), a marker of non-haemogenic endothelium, and expressed GATA2, an indispensable factor for the endothelial-haematopoietic transition (Extended Data Fig. 10d)51. Whereas RUNX1 transcripts are present in the haematopoietic clusters, we did not identify its expression in the endothelial cluster, probably because of its transient nature of upregulation in endothelial cells during the haematopoietic transition. We identified a small number of CDH5low+RUNX1+ITGA2B+ cells in haematopoietic-like clusters that are probably the immediate descendants of the haematopoietic transition from endothelial-like cells (Extended Data Fig. 10e)52. We also identified cells in the haematopoietic-like cluster co-expressing RUNX1 and CD44, a marker of definitive yolk sac haematopoietic progenitors (Extended Data Fig. 12f)53,54. Subsequent IF staining for RUNX1 on day 21 confirmed the specification of these VE-cad+RUNX1+ cells near VE-cad+ endothelium (Extended Data Fig. 10g). Hence, a subset of endothelial-like cells in our platform showed characteristics similar to the haemogenic endothelium.

Further analysis of our day 21 scRNA-seq revealed presence of erythroid- (GYPA (CD235a), HBZ, HBE1), megakaryocyte- (that is, GP1BA (CD42b), CD226, PLEK), macrophage-like (that is, CD33, CD68, CX3CR1) cells, confirming the cell types identified by CFU examination (Extended Data Fig. 10h). Flow cytometry analysis confirmed the presence of erythroid- (CD235ab+), myeloid- (CD33+) and megakaryocyte-like (CD42b+) cells within a CD43+ cell population during weeks 2 and 3. However, these lineages can arise from both primitive and EMP haematopoietic waves (Fig. 5j and Supplementary Fig. 1).

We then probed our heX-embryoids for cells similar to those generated during the definitive phase of yolk sac haematopoiesis but not the primitive. Transcriptomic analysis of the haematopoietic-like cluster shows cells co-expressing function-specific genes of neutrophils such as FUT4 (CD15), MPO and PRTN3 (Extended Data Fig. 10h). The neutrophil-like identity was confirmed in flow cytometry by identification of a population of CD15+CD31+CD45+ cells that have specified by the third week of culture (Fig. 5k and Supplementary Fig. 2)54,55,56.

Although the capacity of the human yolk sac to produce EMP lineages, with potency for erythroid, megakaryocyte, macrophage and granulocyte cell types has been shown, its ability to produce lymphoid progenitors from within the human yolk sac has only recently started to be suggested54,57,58. CD117 (KIT) and CD7 co-expression is prominent in human yolk sac lymphoid lineages54. We could identify markers associated with lymphoid progenies and innate lymphoid cell types (that is, CD7, IL7R, CD3D, NCAM1 (CD56), NKG7, CTSW, IL2RG). (Extended Data Fig. 10h). We detected cells expressing CD7+CD43+CD117+CD45+ as early as day 16 and VLA-4+CD56+CD45+ by the third week of culture by using flow cytometry analysis. These cells represent a putative early lymphoid signature and natural killer-like cells, respectively (Fig. 5l,m and Supplementary Fig. 2)56. Hence, extra-embryonic-like tissue of heX-embryoids show EMP- and lymphoid-primed multipotent progenitor-like haematopoietic programmes.

Discussion

heX-embryoid, an hiPS cell-based model of human embryogenesis, contains embryonic epiblast-like and extra-embryonic endoderm- and mesoderm-like components. The presence of these three tissues was sufficient to trigger complex cellular organization. In the first 4 days, heX-embryoids mimic the segregation of amnion from pluripotent epiblast cells, formation of an amniotic-like cavity, development of bilaminar disc-like structures, generation of anterior hypoblast-like cells and a posterior-like pole resembling events associated with CS5-7 of human development (Extended Data Fig. 11). heX-embryoids lack trophoblast, a prerequisite tissue for the formation of placenta. As such, they do not mirror the biology of the full integrated embryo and can provide a possibility for extra in vitro follow-ups with limited ethical concerns. In our analysis, WT clusters after day 5, showed characteristics mainly aligned with ectodermal fates, which is a subject for future studies and optimization (Extended Data Fig. 12a,b).

The emergence of early haematopoietic programmes has not been studied in fine detail in primates and is widely unexplored in the human embryo and its models. At 2–3 weeks, the de novo haematopoietic activity in heX-embryoids recapitulates the emergence of haematopoiesis in the yolk sac of the human embryo at CS7–9. The possibility to offer a multifaceted model for early blood-like development creates significant research prospects to produce hard-to-access populations of cells applicable to human cell therapies. The pattern of haemoglobin expression, the presence of yolk sac haemogenic endothelial-like progenitors and lineages derived from EMP haematopoiesis such as natural killer-like cells59, together with the existence of granulocyte- and lymphoid-like lineages, indicate that our platform can model primitive and definitive blood formation of the yolk sac. The presence of uni-lineage foci, albeit not the majority, hints at specialized niches directing the differentiation of haematopoietic progenitors. Bona fide yolk sac blood islands are formed by means of the emergence of a lumen resulting from the disappearance of mesodermal cells within areas bordered by early endothelial cells60. Although we have not been able to confirm the occurrence of this process in our cultures, the extra-embryonic-like tissue of heX-embryoids develops morphologically relevant haematopoietic foci that show in vivo-like organization of yolk sac tissue layers with extra-embryonic endoderm- and mesoderm-like components.

From an engineering perspective, our platform highlights the power of tissue niche engineering and codifferentiation. Its creation leverages self-organization from 2D to 3D that is directed by establishing tissue boundaries and geometric confinement, vertical tissue growth, cell migration and polarization and tissue cavitation. Hence, it supports a new way to build 3D structures anchored to a 2D surface, leading to improved robustness, efficiency and control. Engineering multilineage fates by the expression of transcription factors enables self-produced environmental cues (for example, laminin, BMP4) and alleviates the need for complex and often supraphysiologic regimes of growth factors.

The extra-embryonic-like differentiation of heX-embryoids is preprogrammed into undifferentiated hiPS cells by means of an inducible genetic switch and mixed with WT hiPS cells; hence, the undifferentiated cell mix can be expanded, cryostored and shipped for use on demand, requiring only 2D culture plates, commercially available medium and addition of a small molecule, Dox (Extended Data Fig. 12c–e). The compatibility with live imaging, the high-throughput format, efficient generation and an easy-to-implement protocol enable establishment and use across different laboratories. These characteristics, coupled with the potential to be produced from hiPS cells with diverse genetic backgrounds (Extended Data Fig. 12f–i), facilitate previously hard-to-execute studies on the early stages of post-implantation and emergence of the human haematopoietic programmes. As such, this model will provide new routes for drug testing, developmental toxicology, tractable disease modelling and the generation of cells for regenerative therapies in a human context.

Methods

Ethics about the development of heX-embryoids

This research was approved and performed under the oversight of the University of Pittsburgh Human Stem Cell Research Oversight Committee to generate human embryo models from human iPS cell lines (approval number PROTO202300020). The work that has been reported in this study followed 2016 Guidelines for Stem Cell Research and Clinical Translation as well as 2021 ISSCR Guidelines on Ethical Standards for Stem Cell Embryo Model3,5,61. heX-embryoids reported in this study are generated from human iPS cells that are derived from human somatic cells such as fibroblasts. The PGP1 and PGP9 hiPS cells were a kind gift from the Weiss Laboratory (Massachusetts Institute of Technology, USA). Consent was obtained within the Personal Genome Project (https://www.personalgenomes.org/). Details related to authentication and mycoplasma testing are denoted in the reporting summary. heX-embryoids are attached to a cell culture dish, lacking an extra-embryonic trophectoderm tissue critical for full integrated embryo development and implantation in the uterine cavity. The yolk sac-like cavity in these embryoids is not closed, and the tissues cannot be harvested for any implantation without substantial disruption of their structures. TBXT+ posterior-like domains were observed during the study but were not sustained during the development of our system. These features collectively restrict the ability of this model to undergo the full integrated development of a human embryo in vitro and/or implant to support further development of a conceptus in vivo. This study does not use human blastocysts, does not involve the derivation of any human embryonic stem cell lines and does not use samples obtained from fetal abortions. This study does not involve in utero transfer of any human cells or structures into any other species.

Cell culture

All cells and tissues were cultured in a humidified incubator at 37 °C and 5% CO2. Our hiPS cell lines were cultivated under sterile conditions in mTeSR-1 (StemCell Technologies), changed daily. Tissue culture plates were coated for 1 h at room temperature with BD ES-qualified Matrigel (BD Biosciences) diluted according to the manufacturer’s instructions in ice cold DMEM/F-12. Routine passaging was performed by incubating hiPS cell colonies for 5 min in Accutase (Sigma) at 37 °C, collecting the suspension and adding 5 ml of DMEM/F-12 medium containing 10 µM Y-27632, centrifuging at 300g for 5 min and resuspending in DMEM/F-12 supplemented with 10 µM Y-27632 for counting. Cells were seeded at a cell density of 25,000 cells per cm2 for routine maintenance.

GATA6-engineered cell line generation

The previously generated rtTA expressing PGP1 hiPS cells62 were transfected using Lipofectamine 3000 (Thermo Fisher Scientific) with Super PiggyBac Transposase (System Biosciences) and the PiggyBac transposon vector with human GATA6-2A-EGFP under control of the tetracycline responsive element promoter. Transfected cells were selected by adding 0.5 mg ml−1 puromycin to the mTeSR-1 maintenance medium. PGP9 iGATA6-hiPS cells were engineered as explained previously62. For the generation of high GATA6-expressing cells (GATA6-hi cell line), the iGATA6 cell line was sorted by using fluorescence-activated cell sorting (one cell per each well of a 96-well plate) in mTeSR-1 supplemented with 10 µM Y-27632 and 2 µM Thiazovivin. The media was replaced on the day after sorting with mTeSR-1 supplemented with 10 µM Y-27632 and 2 µM Thiazovivin. On day 3 after sorting, 125 µl of mTeSR-1 was added to each well. The wells were monitored afterwards for colony formation. On day 6, the wells with a considerable colony were passaged and the amount of GATA6-EGFP was characterized by Dox induction. The GATA6 concentrations were screened on the basis of a high level of EGFP reporter expression. For generating the supplemented cultures, 25% of iGATA6 cells were substituted for GATA6-hi cells at the time of seeding. The same ratios to WT cells were used as before.

Generation of heX-embryoids

The GATA6-engineered hiPS cells were seeded at either a ratio of 81/5 or 4/1 with rtTA expressing PGP1 hiPS cells either containing or lacking an mKate reporter gene at a total density of 54,000 cells per cm2 in mTeSR-1 supplemented with 10 µM Y-27632. The next day, the medium was changed to mTeSR-1 with 1 μg ml−1 Dox to induce expression of the GATA6 transgene, and this medium was used for daily replacement for up to 5 days. For experiments that continued beyond 5 days, medium was switched to IMDM on day 6. The following presented experiments were seeded at a 4/1 ratio before optimization: Figs. 1b,c, 2c,d and 3d,e, Extended Data Figs. 1f, 2b–d,g,h, 6a, 7d,e, 8d,e and 12c,d,f–i. The remaining experiments shown were seeded at 81/5. We did not predetermine sample sizes via statistical methods. The number of samples used in each experiment was selected to ensure data consistency and reproducibility, and was based on the available resources. Embryoid samples developed in each culture for each experiment were allocated to control and experimental groups in a random manner at seeding when different conditions (that is, pathway inhibition) were present. We did not conduct blinding.

Seeding of 3D heX-embryoids was performed in AggreWell 400 plates at 24,000 cells in total, targeting 12–20 cells per microwell. Media changes were performed as described above, with the following modification: after the initial media change to mTeSR containing Dox, media changes were performed in accordance with the protocol recommended by the manufacturer, in which 75% of the media is removed and replaced to prevent displacement of aggregates.

Cryostorage of heX-embryoids

Uninduced cells were incubated with Dispase at 70% confluence for 10 min, or until there was visible lifting of colony edges. Cells were washed twice with DMEM/F-12, then colonies were manually scraped from the plate. Colonies were centrifuged at 300g for 5 min, and then were resuspended in Cryostor 10. Cells were cooled at −80 °C for 24 h before transfer to liquid nitrogen for long-term storage. Cells were stored in liquid nitrogen for at least 24 h before defrosting.

Signalling pathway inhibition

BMP4 signalling was inhibited by application of 100 ng ml−1 Noggin into the normal media at day 2 of heX-embryoid culture. This concentration of Noggin was supplemented into the media changed on subsequent days.

Staining on glass coverslips

Cells were grown on Matrigel-coated 8 mm diameter circular glass coverslips, on 14 mm coverslip bottom dishes (Mattek Corporation) or on eight-well µ-Slide (ibidi). Cultures were fixed for 10 min in 4% paraformaldehyde (Electron Microscopy Sciences) at room temperature. Coverslips were then washed three times with PBS followed by 15 min permeabilization with 0.2% Triton X-100 in PBS. Subsequently the coverslips were washed three times in wash buffer (0.05% Tween-20 in PBS) for 5 min and blocked for 20 min in 200 ml of wash buffer plus 5% normal donkey serum (Jackson ImmunoResearch Laboratories). The primary antibodies were diluted in 5% normal donkey serum in PBS and incubated with the tissues for 1 h at room temperature followed by three washes in wash buffer for 5 min each. The secondary antibodies were diluted in 5% normal donkey serum in PBS and incubated with the tissues for 1 h at room temperature followed by three washes in wash buffer for 5 min each. Afterwards, the 8 mm coverslips were mounted on microscopy glass slides using ProLong Glass Antifade (Life Technologies), cured overnight at room temperature and then sealed with clear nail polish. Coverslips in coverslip bottom dishes or µ-slides were stored in PBS at 4 °C for 3D imaging.

Image acquisition and processing

Images were acquired using the EVOS M700 automated scanning microscope (M7000 Software Revision v.2.0.2094.0), Leica SP8 confocal microscope (Leica Application Suite X v.3.7.4), Sartorius Incucyte S3 Live Cell Imaging System (software v.v2019B) or Nikon A1 confocal microscope, and processed using Fiji/ImageJ software (National Institutes of Health, NIH)63. Any contrast adjustments were made in individual channels and applied evenly across the whole image in that channel. Contrast and colour balance for colour images were applied evenly across the whole image. Colours in Fig. 1b,c and Extended Data Fig. 2a were produced using the look-up tables published in ref. 64. 3D reconstructions were generated using the Nikon A1 confocal microscope to generate z-stacks spanning roughly 100 μm deep into the tissues and using Imaris (Bitplane) to construct a 3D volume from the stacks. Time-lapse videos were initially processed using Fiji/ImageJ, and annotations were added using the Adobe Premiere software. For Supplementary Video 4, a video moving through the z-slices was initially recorded performing this action in the NIS Elements HC software (Nikon), and the video was subsequently cropped and processed.

Analysis of WT cluster areas and radial expression

Tiled whole-coverslip images were cropped to a central circumscribed square (typically 9,000 μm2) image for analysis. WT compartment analysis was performed using an in-house MATLAB (Mathworks) pipeline (built in MATLAB v.R2020a and run in v.R2022b). In brief, WT compartments were detected programmatically by thresholding the nuclear dye, F-actin stain or mKate marker channel for areas of high signal intensity. Maximum individual compartment area and average cell size were defined manually. Compartments were filtered to remove those with areas close to the defined individual cell area by using the following equation:

$$\text{lower size limit}\,=\,\frac{\text{max compartment area}}{{10}^{(\frac{\text{max compartment area}-\text{single cell area}}{2})}}$$

For compartment counting and WT area analysis, characteristics of each compartment were recorded using the regionprops function in MATLAB. Furthermore, the distance to the nearest WT compartment was recorded.

Analysis of marker expression inward from the compartment perimeter was performed by drawing a line a defined distance towards each compartment’s centroid and recording the intensity value of each marker at each point within this range. If a line was drawn such that it would intersect the compartment edge (for example, drawing from the outer edge of a concave shape), those values were skipped. Intensity values recorded for each line were averaged to create final per-compartment expression distributions. These distributions were aligned on the basis of the point of maximum CER1 expression when present, and the lengths of the distributions were equalized to the length of the longest border recorded using the imresize function in MATLAB.

Analysis of WT cluster features

Coverage of WT clusters was evaluated by using widefield images taken of the entire coverslip. Areas in which EGFP expression observed inside the WT clusters was greater than 50% of the cluster area were counted as being covered. Presence or absence of lumens and/or cavities was evaluated manually, and was detected by the presence of visible rings of PODXL, ZO-1 or F-actin expression within WT compartments.

ISL1 expression within the WT disc was recorded by averaging the radial expression values obtained from the protocol above.

Polarity of TBXT and CER1 domains around WT compartments was assessed in the following way: raw values of CER1 and TBXT produced by radial cluster analysis were normalized to cell density by dividing by the F-actin intensity measured at each point. The rolling average of CER1 and T values at all points within one-eighth of the disc radius in either direction away from a given point around each disc was computed for every point recorded. Positive polarity for each marker was assessed by comparing the minimum and maximum rolling average values for a difference of greater than 0.1 normalized intensity (a.u.). For islands with positivity, the radial index of the highest averaged quadrant value of CER1 was compared to the radial index of the highest average quadrant value of TBXT: a WT compartment with an average peak of TBXT between 135° and 225° away from the CER1 peak was considered to be anti-polar; a WT compartment with a peak of TBXT within 45° of the CER1 peak was considered to be syn-polar. Polarity type assignments per-island were verified manually by eye, and islands with several poles or ambiguous polarization were excluded from analysis.

Analysis of haemogenic endothelial, haemoglobin and haematopoietic marker genes

Whole-coverslip IF images of each marker were cropped into a square, then divided into equivalent quadrants for ease of analysis. WT cluster masks were created by manual thresholding and creation of a binary image using Fiji/ImageJ. These binary images were then used to remove WT clusters from the images using the Image Calculator function applied to the individual channel images. The resulting images were evaluated by using a custom pipeline in CellProfiler65. Scatterplots shown were created using the mean object intensity values for each detected cell. CD43 bar graphs were produced by counts of the number of objects detected by the IdentifyPrimaryObjects function applied to the CD43 channel in isolation. The resulting images were evaluated by using a custom pipeline in CellProfiler65. Scatterplots shown were created using the mean object intensity values for each detected cell. CD43 bar graphs were produced by counts of the number of objects detected by the IdentifyPrimaryObjects function applied to the CD43 channel in isolation.

Haemoglobin expression was evaluated in a similar manner with the following additions: the IdentifyPrimaryObjects function was applied to both the haemoglobin ε and haemoglobin γ channels individually, then the objects were combined by using the RelateObjects function. To produce the scatterplot showing the expression of each marker, the mean object intensity values for each marker were divided by the mean object intensity value for a pan-haemoglobin stain on each coverslip. These normalized values were used for the creation of the graph. For the bar graph showing the distribution of ratios of the markers on each day, the mean object intensity value of haemoglobin ε was divided by the mean object intensity value for haemoglobin γ.

CD34+ area was assessed through Fiji/ImageJ. The cropping and WT cluster masking steps above were followed. Subsequently, the threshold command was used to select areas positive for CD34, and thresholded area was assessed by using the measure command. The area of the threshold was then divided by the total cropped image area to determine the percentage CD34 coverage for each image.

Haematopoietic foci were analysed manually by eye from tilescan images of individual stained wells from a 48-well plate. Clusters of more than 20 cells expressing the marker types assessed with a clearly negative peripheral border were manually partitioned as a single focus, then were evaluated on the basis of the markers expressed within the cells of that focus. A uni-lineage focus was identified if there were fewer than three cells within the focus that did not express the given marker. Because CD43 is a progenitor and pan-haematopoietic marker, cells within the uni-lineage focus that also expressed CD43 were still counted as uni-lineage so long as there was an absence of cells that were positive only for CD43 and negative for the marker of interest within that focus. Multilineage foci (expressing two or more markers) were identified if three or more of the cells expressing each of the given markers could be counted in each area.

Quantitative analysis of z-distribution of haematopoietic foci-related markers

Confocal z-stack images were captured near to the centre of each identified area. For each stack, the bottom was defined as the lowest z-index where an in-focus marker could be identified. The sum of the pixel intensity values from all pixels within each slice above the bottom index were summed, then divided by pixel number. The resulting values at each corresponding z-index were then averaged between all samples identified; these values were then converted to a percentage of the maximum value to produce the graphs shown.

Enzyme-linked immunosorbent assays

Samples were assayed for AFP and APOA1 using commercially available enzyme-linked immunosorbent assay kits (abcam). Sample dilutions were optimized to attain detection in the linear range of the standard curves for each individual assay.

RT–qPCR

Three independent cultures of embryoids per each day of collection were lysed with Trizol (Invitrogen) for RNA extraction. RNA extraction was done using Direct-zol RNA Miniprep Kit (Zymo Research). Following complementary DNA (cDNA) synthesis (Thermo Fisher), equal amounts of cDNA per sample were analysed for gene expression using QuantStudio 3 Real-Time PCR Systems (Applied Biosystems). 18S Ribosomal RNA was used as the endogenous control and the 2−ΔΔCT method was used for calculation of relative gene expression. The sequences for the used primers are shown in Supplementary Table 1.

10X Genomics sample preparation for next-generation sequencing

Samples were prepared as described by the 10X Genomics Cell Multiplexing Oligo Labelling for Single-Cell RNA Sequencing Protocols for cells with more than 80% viability. heX-embryoids were acquired in single-cell suspension by incubation with Accutase for 20 min at 37 °C. The cell suspension was passed through a 40 μm strainer to remove aggregates. Each sample suspension was then centrifuged at 300g for 5 min at room temperature. The supernatant was manually aspirated from each of the samples using a P1000 pipette. The cell pellets were each resuspended in 1 ml of PBS + 0.04% BSA, and the samples were centrifuged at 300g for 5 min at room temperature. The samples were then resuspended, targeting 1 × 106 cells per ml on the basis of expected densities of cells each day, and the cell suspension was passed through a 40 μm strainer again. Each cell suspension was counted using a hemocytometer, and a volume of cell suspension was removed as necessary to adjust the final cell count to 1 × 106 cells. The volumes were then adjusted to 1 ml by adding a further PBS + 0.04% BSA to replace the removed volumes.

For multiplexing oligo labelling

Samples were then centrifuged at 300g for 5 min at room temperature. Supernatant was aspirated manually with a P1000 pipette. Samples were resuspended in 50 μl of CellPlex Multiplexing Solution (10X Genomics), with unique multiplexing oligo solutions assigned to each sample. A maximum of 12 samples were labelled in parallel. Samples were incubated for 5 min, starting after the oligo solution was added to the last sample.

Following labelling, 1.95 ml of 1× PBS + 1% BSA was added to each sample, and the solution was mixed thoroughly. Each sample was centrifuged at 300g for 5 min at 4 °C. The supernatant was manually aspirated with a P1000 pipette, taking care to leave less than 10 μl of supernatant remaining in each sample when possible. The samples were resuspended in 2 ml of 1× PBS + 1% BSA and were mixed thoroughly to wash. This wash and centrifugation step was repeated twice more; at the last resuspension, enough wash buffer was added to reach a cell count of 1 × 106 cells per ml, assuming 50% cell loss from the count at the end of the first protocol. Labelled cell suspensions were then put on ice for transfer to the Pitt Single-Cell Core for library creation.

Following a final counting and viability analysis, cells and 10X Genomics reagents were loaded into the single-cell cassette, with a target of 25,000 single cells for analysis, accounting for predicted cell loss and doublets resulting from multiplexing as laid out in the user guide for the Chromium Single-Cell 3′ Reagent Kit (10X Genomics). After generation of gel in-bead emulsions, the cDNA library was prepared by Pitt Single-Cell Core staff following the appropriate steps determined by the 10X Genomics user guide. Libraries were sent to the UPMC Genome Centre for sequencing by a NovaSeq S4-200 for an intended read depth of 100,000 reads per cell with 150 bp paired end reads. Our downstream analysis from the sequencing data yielded between 40,000 and 150,000 mean reads per cell in different samples.

scRNA-seq sample processing and quality control

The 10X Genomics CellRanger pipeline was used to align reads to the reference genome (GRCh38.84) appended with transgene sequences, to assign reads to individual cells and to estimate gene expression on the basis of unique molecular identifier (UMI) counts.

We used CellRanger (v.6.1.2) ‘multi’ pipeline to demultiplex the libraries and quantify the gene expressions from raw fastq files. We first downloaded the FASTA file and GTF file of the human reference genome (GRCh38.84) and constructed the genome index by using the mkref command available in the CellRanger tool kit. This genome index was augmented by a few exogenous sequences (sequence names) to enable alignment against the exogenous sequences. We specified the number of expected cells in the config file as 24,000 for the batch containing the day 21 scRNA-seq and 30,000 for the batch containing the day 5 and lower scRNA-seq. The multi command in the CellRanger tool kit was then run to perform demultiplexing into GEX libraries and CellPlex libraries and quantification of gene counts. Other than the number of expected cells specified above, we used the default parameters of the multi command. We used the filtered gene count output for our downstream analyses (located under the CellRanger output folder ‘outs/per_sample_output/sample_name/count/sample_feature_bc_matrix’).

Seurat V4 was used for downstream processing of the scRNA-seq data66. Single-cell data were excluded on the basis of a high mitochondrial genome transcript ratio and either high or low feature or UMI counts. Genes with UMI counts in fewer than five cells were removed from consideration. For scRNA-seq data processing and cluster analysis using Seurat, we used the following general standardized pipeline for processing of the CellRanger output: SCTransform regressing percentage mitochondrial genes, principal component analysis and clustering. Jackstraw plots and permuted P values were used to assist in determining the optimal number of principal components needed to summarize the datasets without losing a significant amount of variation. The quality of a range of clustering resolution values was assessed using enrichment of cluster marker genes (genes differentially upregulated in a given cluster relative to all other clusters) with embryo cell type-specific genes. As a quality check, principal components and resolution metrics were modulated to yield fewer or extra clusters to confirm that chosen parameters resulted in the most biologically relevant clustering. Visualization was achieved by the use of uniform manifold approximation and projection (UMAP) plots identifying cells, clusters and selected gene expression in each cell, as well as heatmaps and violin plots showing the expression level of genes by cluster. Stacked violin plots were created using the scCustomize package in R67.

Subclustering the WT in the D3 and the endothelial cluster in the D21 dataset was achieved by subsetting the cluster of interest, then computing the distance between rows of the data matrix using Euclidean measurement (dist function in R). Hierarchical clustering was applied using this distance matrix (hclust in R), and used to create a dendrogram of the distance explained by different numbers of subclusters. The number of subclusters was selected on the basis of the highest level of distance observed and applied to the data (clustercut in R). The cluster was then added back to the rest of the dataset and markers were determined.

Gene set enrichment analysis

Gene set enrichment analysis was performed between cluster 5 and cluster 2 of the merged D0–D5. The differentially expressed markers in cluster 5 as compared to cluster 2 were obtained by using the FindMarkers() function in Seurat. All marker genes with a log2 fold change above 1 were used as an input to the Enrichr web server68,69. The top ten pathways identified in the KEGG 2021 and Reactome 2022 pathways were used. Top pathways were identified using the Combined Score, which was computed by the Enrichr site and was defined as the log of the P value from the Fisher exact text multiplied by the z-score of the deviation from the expected rank.

Comparison of embryoid culture clusters to yolk sac and definitive endoderm clusters

Generating Jaccard similarity-based scores

We compared the heX-embryoid cluster markers to yolk sac endoderm and proliferating definitive endoderm (DE(P)) with the following steps:

  1. 1.

    We denote the unique overlap of markers between the embryoid cluster and DE(P) cluster as A and the unique overlap of markers between the embryoid cluster and yolk sac endoderm cluster as B and marker genes for embryoid cluster as C; marker genes for DE(P) as D and marker genes for yolk sac endoderm as E.

  2. 2.

    We computed the Jaccard similarity indices (JSI): JSI for embryoid/DE(P) is j1 = A/(CD); JSI for embryoid/YS endoderm is j2 = B/(CE). The JSI basically adjusts the overlap size by the two marker cluster sizes.

  3. 3.

    We then measure the significance of j2 − j1, if j2 is significantly larger than j1, then we may say the unique overlap of embryoid/YS endoderm is significantly greater than the unique overlap of embryoid/DE(P).

  4. 4.

    To compute the significance level, we performed a random sampling of three sets of genes (of the same sizes as the clusters) from the genome, which generated an empirical distribution of j2 − j1. On the basis of this empirical distribution, we then computed the empirical P value for the significance and visualized the P value scores by −log10(P).

Volcano plot

We computed the expression fold changes of the yolk sac endoderm cluster with respect to the DE(P) cluster from ref. 23 using the function ‘FindMarkers()’ in Seurat package (v.4.0.1). The fold changes were then used to produce the volcano plots. Marker genes of each the embryoid clusters were labelled in the volcano plots.

Statistical analysis

Statistical tests were noted in the legend of each corresponding figure panel. For the similarity quantification between lists of marker genes, we quantify the similarity between two lists of marker genes A and B by using the hypergeometric test for overrepresentation. This is equivalent to a one-tailed Fisher’s exact test. Data obtained for comparison were taken from refs. 23,24,25,26.

For the statistical tests used in the bar graphs when calculating the P value for Extended Data Fig. 6f, we used an unpaired, two-tailed t-test.

The statistical test for HBE/HBG expression used to calculate the P value for the comparison of HBE/HBG expression level across days of culture in Fig. 5i was a one-way analysis of variance (ANOVA) test. Tukey’s test was used as the post hoc test for multiple comparisons.

In vitro CFU assays

Cells from day 21 heX-embryoids were dissociated into single-cell solution by application of Accutase (StemCell Technologies) for 30 min. CFU assays were performed using methylcellulose containing media (MethoCult SF H4636, StemCell Technologies) according to the manufacturer’s instructions. After 14 days in culture, colonies were analysed by image acquisition using light microscopy.

Flow cytometry analysis

To prepare a single-cell solution of heX-embryoid, the culture was treated for 45 min with Collagenase C solution (3 mg ml−1 StemCell Technologies) followed by 15 min of treatment with Accuatse (Sigma). FC block solution (Thermo Fischer) was added to the samples followed by 10 min of incubation on ice. Next, the antibody mix (final dilution of 1:400) was added to the samples followed by 30 min incubation on ice. Cells were analysed using an LSR II flow cytometer (BD Bioscience) using 7-AAD (BD Pharmingen) for dead cell staining. The antibodies used were CD34-APC (Clone 581, Biolegend), KIT-BV421 (Clone 104D2, BD Bioscience), CD43-PE (Clone CD43-10G7, Biolegend), CD235ab-PE-Cy7 (Clone HIR2, Biolegend), CD33-BV605 (Clone P67.6, Biolegend), CD42b-AF700 (Clone HIP1, Biolegend), CD7-PE-Cy7 (Clone 4H9/CD7, Biolegend), CD45-Pacific Blue (Clone HI30, Biolegend), CD45-APC-Cy7 (Clone HI30, Biolegend), CD45-APC (Clone HI30, Biolegend), CD31-PE-Cy7 (Clone WM59, Biolegend), CD15-PE (Clone HI98, Biolegend), CD56-PE-Cy7 (Clone MEM-188, Biolegend) and CD49d(VLA-4)-BV605 (Clone 9F10, Biolegend). Back-gating, controls and analysis steps can be seen in Supplementary Information 1 and 2.

Statistics and reproducibility

All experiments were repeated in at least three biological replicates with similar results, except where indicated in the legends or in the Methods.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.